Bilinear Attention Networks

Neural Information Processing Systems

Attention networks in multimodal learning provide an efficient way to selectively utilize given visual information. However, the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitively expensive. To work around this, co-attention builds two separate attention distributions, one per modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representations for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to efficiently exploit the eight attention maps of BAN. We quantitatively and qualitatively evaluate our model on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.
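The low-rank bilinear attention described in the abstract can be sketched in a few lines of NumPy. The dimensions, projection matrices U and V, and pooling vector p below are illustrative stand-ins, not the paper's actual configuration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, r = 8, 4          # channel dim, low rank (illustrative sizes)
K, M = 3, 5          # visual channels, textual channels
X = rng.standard_normal((K, d))   # visual features
Y = rng.standard_normal((M, d))   # textual features
U = rng.standard_normal((d, r))   # low-rank projections
V = rng.standard_normal((d, r))
p = rng.standard_normal(r)        # pooling vector

# Bilinear attention logits: one score per (visual, textual) channel pair
logits = (X @ U)[:, None, :] * (Y @ V)[None, :, :]  # (K, M, r)
A = softmax((logits @ p).ravel()).reshape(K, M)     # joint distribution over pairs

# Attended joint representation via low-rank bilinear pooling
f = np.einsum('km,kr,mr->r', A, X @ U, Y @ V)
print(A.shape, f.shape)  # (3, 5) (4,)
```

The key point is that A is a single distribution over channel *pairs*, rather than two separate per-modality distributions as in co-attention.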


Switching control of underactuated multi-channel systems with input constraints for cooperative manipulation

Lee, Dongjae, Dimarogonas, Dimos V., Kim, H. Jin

arXiv.org Artificial Intelligence

Abstract--This work presents an event-triggered switching control framework for a class of nonlinear underactuated multi-channel systems with input constraints. These systems are inspired by cooperative manipulation tasks involving underactuation, where multiple underactuated agents collaboratively push or pull an object to a target pose. To simultaneously account for channel assignment, input constraints, and stabilization, we formulate the control problem as a mixed-integer linear program (MILP) and derive sufficient conditions for its feasibility. To improve real-time computational efficiency, we introduce an event-triggered control scheme that maintains stability even between switching events through a quadratic-programming-based stabilizing controller. We theoretically establish the semi-global exponential stability of the proposed method and the asymptotic stability of its extension to nonprehensile cooperative manipulation under noninstantaneous switching. The proposed framework is further validated through numerical simulations on 2D and 3D free-flyer systems and multi-robot nonprehensile pushing tasks. Cooperative tasks in which objects are collectively controlled by multiple agents, such as drone swarms and robotic arms in manufacturing, rely on precise object manipulation.
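As a toy illustration of casting a channel-assignment problem as a mixed-integer linear program, the sketch below solves a small assignment instance with scipy.optimize.milp. The cost matrix and instance size are invented for illustration and are unrelated to the paper's actual formulation:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy instance: 3 agents, 3 channels; cost[i, j] of assigning agent i to channel j.
cost = np.array([[4., 1., 3.],
                 [2., 0., 5.],
                 [3., 2., 2.]])
n = cost.shape[0]
c = cost.ravel()  # binary decision vars x[i, j], flattened row-major

# Each agent takes exactly one channel; each channel gets exactly one agent.
A_rows = np.kron(np.eye(n), np.ones(n))   # sum_j x[i, j] = 1
A_cols = np.kron(np.ones(n), np.eye(n))   # sum_i x[i, j] = 1
constraints = LinearConstraint(np.vstack([A_rows, A_cols]), 1, 1)

res = milp(c=c, constraints=constraints,
           integrality=np.ones(n * n),    # all variables integer (here: binary)
           bounds=Bounds(0, 1))
assignment = res.x.reshape(n, n).round().astype(int)
print(assignment, res.fun)  # optimal assignment with total cost 5.0
```

Real formulations of this kind additionally encode input constraints and stabilization conditions as linear inequalities over the same binary assignment variables.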



A Appendix

Neural Information Processing Systems

All CPU experiments are conducted on AWS C5.9xlarge instances with Intel Xeon Platinum 8124M CPUs. Take TensorCore GPUs as an example. MetaSchedule makes an orthogonal contribution, as it is a probabilistic language for composable search space construction rather than a technique for speeding up tuning. The tensor program to be optimized is generated from the computational graph of a frontend framework, for example, TensorFlow, PyTorch, or JAX.

A.7 Available Transformation Primitives

split: Split a loop into a sequence of consecutive loops
fuse: Fuse a sequence of consecutive loops into one
reorder: Reorder a sequence of loops
parallel: Parallelize a loop across CPU cores
vectorize: Vectorize a loop with SIMD
unroll: Unroll a loop
bind: Bind a loop to a GPU thread
cache-read: Create a block that reads a buffer region into a read cache
cache-write: Create a block that writes a buffer region into a write cache
compute-at: Move a producer block under the specific loop
compute-inline: Inline a block into its consumer(s)
rfactor: Factorize an associative reduction block by the specified loop
storage-align: Set an alignment requirement for a specific dimension of a buffer
set-scope: Set the storage scope of a buffer
add-unit-loop: Create a new unit loop on top of the specific block
re-index
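The split and fuse primitives above can be illustrated on a plain Python loop nest. This is a sketch of the loop-transformation semantics only, not the actual TVM/MetaSchedule API:

```python
import numpy as np

N = 12
a = np.arange(N, dtype=np.int64)
out_ref = np.zeros(N, dtype=np.int64)

# Original loop
for i in range(N):
    out_ref[i] = a[i] * 2

# "split": the i loop split into an outer loop of size N // 4 and an inner loop of size 4
out_split = np.zeros(N, dtype=np.int64)
for io in range(N // 4):
    for ii in range(4):
        i = io * 4 + ii
        out_split[i] = a[i] * 2

# "fuse": the (io, ii) nest fused back into a single flat loop
out_fused = np.zeros(N, dtype=np.int64)
for fused in range(N):
    io, ii = divmod(fused, 4)
    i = io * 4 + ii
    out_fused[i] = a[i] * 2

# Both transformations preserve the computed result
assert (out_split == out_ref).all() and (out_fused == out_ref).all()
```

In a schedule language, such semantics-preserving rewrites expose parallelism and locality choices (e.g. binding the outer loop to threads) without changing what the program computes.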


Channel Gating Neural Networks

Weizhe Hua, Yuan Zhou, Christopher M. De Sa, Zhiru Zhang, G. Edward Suh

Neural Information Processing Systems

Unlike static network pruning, channel gating optimizes CNN inference at run-time by exploiting input-specific characteristics, which allows substantially reducing the compute cost with almost no accuracy loss. We experimentally show that applying channel gating in state-of-the-art networks achieves 2.7-8.0
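A minimal sketch of the run-time gating idea, assuming a simplified linear layer, an invented base-path fraction, and a zero decision threshold (the actual method gates convolution channels with a learned gate function):

```python
import numpy as np

rng = np.random.default_rng(1)
C_in, C_out = 16, 8
x = rng.standard_normal(C_in)
W = rng.standard_normal((C_out, C_in))

frac = 0.25                          # fraction of input channels in the cheap "base" path
k = int(C_in * frac)
partial = W[:, :k] @ x[:k]           # partial sums from the first k input channels

threshold = 0.0                      # gate: skip the rest where the partial sum is weak
gate = partial > threshold           # input-specific decision per output channel

y = partial.copy()
rest = W[:, k:] @ x[k:]
y[gate] += rest[gate]                # full computation only for gated-on channels

# Fraction of multiply-accumulates skipped for this particular input
saved = 1.0 - (k / C_in + gate.mean() * (1 - k / C_in))
print(f"fraction of MACs skipped: {saved:.2f}")
```

Because the gate depends on the input x, the saving varies per example, which is exactly what distinguishes this from static pruning.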


Appendices A Bernoulli-CRS Properties

Neural Information Processing Systems

First, we show that the above holds in expectation (Proposition 1). The number of sampled elements T is controlled through the parameter k: Proposition 2. E[T] = k. Let us further derive the properties of the proposed sampling algorithm. For notational simplicity, we assume zero padding. This formulation immediately hints at the possibility of sampling over the input channel dimension, similarly to sampling column-row pairs in matrices. Figure 2 illustrates the sampling operation.
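The column-row analogy can be sketched as follows: approximate a matrix product by Bernoulli-sampling column-row pairs with keep probabilities summing to roughly k, so the expected sample count E[T] matches the role of k in Proposition 2. The probabilities, dimensions, and norm-based heuristic here are illustrative, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 6, 5, 64                  # output dims, inner ("channel") dimension
A = rng.standard_normal((n, d))
B = rng.standard_normal((d, m))

k = 16                              # expected number of sampled column-row pairs
norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p = np.minimum(1.0, k * norms / norms.sum())   # keep probabilities, sum(p) ≈ k

keep = rng.random(d) < p            # Bernoulli draws; T = keep.sum(), E[T] ≈ k
# Unbiased estimate: scale each kept column-row outer product by 1 / p_i
approx = (A[:, keep] / p[keep]) @ B[keep, :]

rel_err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
print(keep.sum(), rel_err)
```

Sampling over the input channel dimension of a convolution works the same way: each channel's contribution is an outer-product-like term, kept with probability p_i and rescaled by 1/p_i to stay unbiased.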


Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction Appendix

Neural Information Processing Systems

In the following, we provide the Appendix as part of the supplementary material to the main paper. Section C contains additional content about the model zoos. We also provide visualizations of some of the properties of our model zoo for better intuition. Consider a common, fully-connected feed-forward neural network (FFN). Training of neural networks is defined as an optimization against an objective function on a given dataset, i.e. their weights and biases are chosen to minimize a cost function, usually called the loss. Earlier layers' errors are computed from the output-layer error δ^L (Eq. 6), where β is a positive learning rate.
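The training procedure alluded to above, with an output-layer error δ^L backpropagated to earlier layers and a positive learning rate β, can be sketched for a tiny FFN; the architecture, step count, and learning rate are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny FFN: 4 -> 8 -> 1, sigmoid hidden layer, squared-error loss
W1, b1 = rng.standard_normal((8, 4)) * 0.5, np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)) * 0.5, np.zeros(1)
x, y = rng.standard_normal(4), np.array([1.0])
beta = 0.1                                      # positive learning rate

sigmoid = lambda z: 1 / (1 + np.exp(-z))

def loss():
    return 0.5 * np.sum((W2 @ sigmoid(W1 @ x + b1) + b2 - y) ** 2)

before = loss()
for _ in range(50):
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    out = W2 @ a1 + b2
    delta2 = out - y                            # output-layer error δ^L
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)    # earlier layer's error via backprop
    W2 -= beta * np.outer(delta2, a1); b2 -= beta * delta2
    W1 -= beta * np.outer(delta1, x);  b1 -= beta * delta1
after = loss()
print(before, after)
```

Each gradient step moves the flattened weight vector; it is exactly these trained weight vectors that the paper's model zoos collect as data points.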


Few-Shot Audio-Visual Learning of Environment Acoustics Supplementary Material

Neural Information Processing Systems

Moreover, we qualitatively demonstrate our model's prediction quality. Please use headphones to hear the spatial audio correctly. As we can see, the prediction error tends to be small when the source is relatively close to the receiver, or there are no major obstacles along the path connecting them. We show two scenes and two examples per scene. For our experiment with ambient environment sounds, we will publish the link to our datasets on our project page. Here, we provide our architecture and additional training details for reproducibility.